ABSTRACT

Experimentwise error rates can inflate rapidly when
researchers use multiple univariate tests. Both (a) ANOVA post
hoc and (b) multivariate methods incorporate a correction for
experimentwise error. Researchers ought to understand
experimentwise error if they are to understand (a) what post hoc
tests really do and (b) an important rationale for multivariate
methods.

A REVIEW OF EXPERIMENTWISE TYPE I ERROR:
IMPLICATIONS FOR UNIVARIATE POST HOC AND FOR MULTIVARIATE TESTING

Researchers are wary of making a Type I error. To guard
against one, researchers set alpha to be small.
However, some researchers focus only on "testwise" alpha and
are unaware of "experimentwise" alpha and the importance of
not inflating "experimentwise" Type I error rates. This paper
reviews experimentwise Type I error. The concept is
fundamentally important in two respects. First, ANOVA post hoc
tests implicitly incorporate a correction for experimentwise
error; if this correction is not understood, the researcher does
not understand post hoc tests themselves. Second, experimentwise
error concerns are one reason why multivariate tests are almost
always vital in educational research (Fish, 1988; Thompson,
1999), so researchers ought to understand experimentwise error if
they are to understand an important rationale for multivariate
methods.

Experimentwise Error 

Researchers are cognizant of the possibility of rejecting a
null hypothesis ($H_0$) even when the $H_0$ is true. This is called a
"testwise" Type I error. Researchers set an alpha ($\alpha$) level a
priori at a small near-zero value to protect against testwise
Type I errors. If the alpha level is set at .01, then a true
null hypothesis will be falsely rejected one percent of the
time. In such cases, the null is rejected in the sample even
though it is true in the population.
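The meaning of a testwise alpha of .01 can be checked with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes a two-sided z-test on samples drawn from a population in which the null hypothesis is true, and the function name, sample size, trial count, and seed are all hypothetical choices.

```python
import random
from statistics import NormalDist


def null_rejection_rate(alpha=0.01, n=20, trials=10_000, seed=1):
    """Estimate how often a true null is rejected by a two-sided z-test."""
    rng = random.Random(seed)
    # Two-sided critical value for the chosen testwise alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Sample from a population in which the null (mean 0, sd 1) is true.
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = sample_mean * n ** 0.5  # standard normal when the null is true
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# The estimate should be close to alpha, up to simulation noise.
print(null_rejection_rate())
```

Under these assumptions the proportion of false rejections settles near the chosen alpha, which is what "one percent of the time" refers to.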
Most researchers are familiar with "testwise" alpha ($\alpha_{TW}$).
However, while "testwise" alpha refers to the probability of
making a Type I error for a given hypothesis test,
"experimentwise" (or "familywise" -- see Maxwell, 1992, p. 138)
error rate refers to the probability of having made a Type I
error anywhere within a set of hypothesis tests (Thompson, 1994).
"Experimentwise" error rate inflates when a number of hypotheses
are tested (e.g., two or more dependent variables) at the same
alpha level within a given study (Love, 1988).

"Experimentwise" error rate equals "testwise" error rate 
when only one hypothesis is tested for a given group of people in 
a study. However, when more than one hypothesis is tested
in a given study with only one sample, the two error rates may
not be equal (Thompson, 1994). This occurs because the Type I
error risks of the individual hypothesis tests accumulate, which
can produce a highly inflated experimentwise error rate. Huberty
and Morris (1989, p. 306) referred to this as "probability
pyramiding." Depending on the number of hypotheses being tested,
the inflation of experimentwise error rates can be quite serious,
as emphasized by Morrow and Frankiewicz (1979).

When multiple hypotheses are tested (e.g., two or more
dependent variables) in a single sample study, experimentwise
and testwise error rates are equal only if the
hypotheses (or the dependent variables) are perfectly correlated.
This is because, when
one has perfectly correlated hypotheses, one actually is still
only testing a single hypothesis. Therefore, it can be said that
two factors affect the inflation of experimentwise Type I error:
(a) the number of hypotheses tested using a single sample of
data, and (b) the degree of correlation among the dependent
variables or the hypotheses tested (Thompson, 1994).

Bonferroni Formula for $\alpha_{EW}$

"Experimentwise" error rate inflation is at its maximum when 
multiple dependent variables (e.g., multiple hypothesis tests) in 
a single sample study are perfectly uncorrelated (Fish, 1988) . 
When this occurs, the experimentwise error rate ($\alpha_{EW}$) can be
calculated exactly. This is done using what is called the Bonferroni
inequality (Love, 1988):

$$\alpha_{EW} = 1 - (1 - \alpha_{TW})^{k},$$

where k is the number of perfectly uncorrelated hypotheses or
variables being tested at a given testwise alpha level ($\alpha_{TW}$).

For example, if four perfectly uncorrelated hypotheses (or
dependent variables) are tested using data from a single sample,
each at the $\alpha_{TW}$ = .01 level of statistical significance, the
experimentwise Type I error rate will be:

$$
\begin{aligned}
\alpha_{EW} &= 1 - (1 - \alpha_{TW})^{k} \\
&= 1 - (1 - .01)^{4} \\
&= 1 - (.99)^{4} \\
&= 1 - (.99)(.99)(.99)(.99) \\
&= 1 - .960596 \\
&= .039404.
\end{aligned}
$$

Thus, for a study testing four perfectly uncorrelated
dependent variables, each at the $\alpha_{TW}$ = .01 level of statistical
significance, the probability is .039404 (or 3.9404%) that one or
more null hypotheses will be incorrectly rejected within the
study. However, knowing this will not inform the researcher as
to which of the statistically significant results, if any,
are Type I errors. Table 1 provides an illustration of these
calculations for several $\alpha_{TW}$ levels. This table also illustrates
how quickly $\alpha_{EW}$ can become inflated.
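The same calculation can be repeated for other testwise levels and numbers of tests. The following minimal sketch evaluates the formula $\alpha_{EW} = 1 - (1 - \alpha_{TW})^{k}$ over a small grid. The function name and the particular $\alpha_{TW}$ and k values are illustrative assumptions and are not the entries of Table 1.

```python
def experimentwise_alpha(alpha_tw, k):
    """alpha_EW = 1 - (1 - alpha_TW)^k for k perfectly uncorrelated tests."""
    return 1 - (1 - alpha_tw) ** k


# Illustrative grid (hypothetical choices, not the paper's Table 1).
for alpha_tw in (0.01, 0.05, 0.10):
    row = "  ".join(
        f"k={k}: {experimentwise_alpha(alpha_tw, k):.4f}" for k in (1, 2, 4, 10, 20)
    )
    print(f"alpha_TW={alpha_tw:.2f}  {row}")
```

With k = 1 the function returns the testwise level itself, and for the worked example above (four tests at .01) it returns the same .039404.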

Witte (1985) explains the two error rates using an
intuitively appealing example involving a coin toss. If the toss
of heads is equated with a Type I error, and if a coin is tossed
only once, then the probability of a head on the one toss ($\alpha_{TW}$),
and of at least one head within a set ($\alpha_{EW}$) consisting of one
toss, will both equal 50%.

If the coin is tossed three times, the "testwise"
probability of a head on each toss is still 50%, i.e., $\alpha_{TW}$ = .50
(not .05). The Bonferroni inequality is a literal fit to this
example situation, because the
coin's behavior on each flip is literally uncorrelated with the
coin's behavior on previous flips. In other words, the coin does
not alter its behavior on any given flip as a result of its
behavior on any previous flip.

Thus, the "experimentwise" probability ($\alpha_{EW}$) that there will
be at least one head in the whole set of three flips will be
exactly:

$$
\begin{aligned}
\alpha_{EW} &= 1 - (1 - \alpha_{TW})^{k} \\
&= 1 - (1 - .50)^{3} \\
&= 1 - (.50)^{3} \\
&= 1 - (.50)(.50)(.50) \\
&= 1 - (.2500)(.50) \\
&= 1 - .125000 \\
&= .875000.
\end{aligned}
$$

Table 2 illustrates these concepts more concretely. In the
table are listed eight equally likely outcomes for sets of three
coin flips. Of the eight sets of three flips, seven involve one
or more Type I errors, defined in this example as a head.
Consistent with the Bonferroni inequality, 7/8 equals .875000, as
expected.
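The eight outcomes can also be enumerated directly. The short sketch below lists every sequence of three flips and counts those containing at least one head; the labels "H" and "T" and the variable names are illustrative choices.

```python
from itertools import product

# All 2**3 = 8 equally likely outcomes of three flips ("H" = heads, "T" = tails).
outcomes = list(product("HT", repeat=3))
with_head = [o for o in outcomes if "H" in o]

for o in outcomes:
    flag = "at least one head" if "H" in o else "no heads"
    print("".join(o), flag)

proportion = len(with_head) / len(outcomes)
print(proportion)            # 7/8 = 0.875
print(1 - (1 - 0.50) ** 3)   # the formula gives the same value, 0.875
```

Only the sequence with three tails contains no head, so the enumeration and the formula agree.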

As stated earlier, the above example is a literal fit for
the Bonferroni inequality because the behavior of the coin on a
given flip is uncorrelated with the behavior of the coin on any
other flip. The exact $\alpha_{EW}$ can be determined using the Bonferroni
inequality formula if the hypotheses or variables are perfectly
uncorrelated. This formula is not necessary when the hypotheses
are perfectly correlated because the $\alpha_{EW}$ and the $\alpha_{TW}$ equal each
other.

However, in most studies hypotheses are neither perfectly
uncorrelated nor perfectly correlated, but rather are partially
correlated. For such studies, the actual experimentwise error
rate will fall somewhere between the computed experimentwise
error rate (see above) and the testwise error rate, but may never
really be known (Fish, 1988; Love, 1988; Morrow & Frankiewicz,
1979).
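Although the exact rate is generally unknown in practice, the bounds can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than a procedure from the cited sources: it supposes k = 4 two-sided z-tests whose test statistics are standard normal under the null and equally correlated at a value rho, and it estimates how often at least one test rejects at $\alpha_{TW}$ = .05. The function name, the equal-correlation structure, and the seed are hypothetical.

```python
import random
from statistics import NormalDist


def simulated_alpha_ew(rho, k=4, alpha_tw=0.05, trials=20_000, seed=7):
    """Monte Carlo estimate of alpha_EW for k equally correlated z-tests."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha_tw / 2)
    shared_w, unique_w = rho ** 0.5, (1 - rho) ** 0.5
    any_rejection = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        # Each statistic is standard normal; every pair correlates at rho.
        zs = [shared_w * shared + unique_w * rng.gauss(0.0, 1.0) for _ in range(k)]
        if any(abs(z) > z_crit for z in zs):
            any_rejection += 1
    return any_rejection / trials


for rho in (0.0, 0.5, 0.9, 1.0):
    print(f"rho={rho:.1f}  estimated alpha_EW={simulated_alpha_ew(rho):.3f}")
```

Under these assumptions the estimate for rho = 0 should sit near 1 - (.95)^4, the estimate for rho = 1 near .05, and partially correlated cases in between.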

Also, the $\alpha_{EW}$ inflation can be quite severe, depending on the number
of hypotheses tested and the level of correlation. Therefore,
the power to reject can be low (Olejnik, Li, Supattathum, &
Huberty, 1997). In other words, when multiple univariate follow-
up tests are each conducted at the original $\alpha_{TW}$ level (e.g., .05),
the $\alpha_{EW}$ is inflated, so that a statistically significant result
may be obtained even if no true effect exists anywhere in the
study. In order to
compensate for this, researchers apply a "correction." This is
called the "Bonferroni correction."

Bonferroni Correction

The Bonferroni correction compensates for the inflation by
dividing the original $\alpha_{TW}$ by the number of hypotheses (k) in the
study, yielding a new $\alpha_{TW}^{*}$ (Maxwell, 1992; Thompson, 1994):

$$\alpha_{TW}^{*} = \frac{\alpha_{TW}}{k}$$

Each individual post hoc test then utilizes the $\alpha_{TW}^{*}$ in order to
maintain the $\alpha_{EW}$ at an appropriate level. Table 3 illustrates
how the Bonferroni correction is utilized in order to maintain
the $\alpha_{EW}$ at an appropriate level. However, this table also
illustrates how the use of the Bonferroni correction has the
potential for severe loss in power (Olejnik, Li, Supattathum, &
Huberty, 1997).
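The effect of the correction can be sketched numerically. The example below assumes perfectly uncorrelated tests and an original $\alpha_{TW}$ of .05, and compares the experimentwise rate without and with the correction. The function names and the k values are illustrative and are not the entries of Table 3.

```python
def bonferroni_alpha(alpha_tw, k):
    """Corrected testwise alpha: alpha_TW* = alpha_TW / k."""
    return alpha_tw / k


def experimentwise_alpha(alpha_tw, k):
    """alpha_EW for k perfectly uncorrelated tests, each run at alpha_tw."""
    return 1 - (1 - alpha_tw) ** k


alpha_tw = 0.05
for k in (2, 4, 10, 20):
    corrected = bonferroni_alpha(alpha_tw, k)
    print(
        f"k={k:2d}  uncorrected alpha_EW={experimentwise_alpha(alpha_tw, k):.4f}  "
        f"alpha_TW*={corrected:.4f}  "
        f"corrected alpha_EW={experimentwise_alpha(corrected, k):.4f}"
    )
```

With the correction the experimentwise rate stays at or just below the original .05, but each individual test must now meet a much stricter criterion, which is the source of the power loss noted above.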

Post Hoc Analysis 

After using an ANOVA omnibus test to analyze overall
differences in a multi-group study with more than two groups,
many researchers use "post hoc" (also called "a posteriori,"
"unplanned," or "unfocused") tests to determine which group means
differ for each set of pairs or combinations of groups. All
comparisons/contrasts test only whether exactly two means are
equal. There are two kinds of comparisons: simple and complex.
Although all contrasts test the equality of exactly two means,
simple and complex contrasts differ as regards the permissible
ways in which the two means are created. Put simply, "simple"
contrasts compare the dependent variable means of two groups
using the existing levels of a way, without any combinations of
any levels. "Complex" contrasts, on the other hand, include all
possible "simple" contrasts, but also include means computed by
aggregating data across levels of the way.

For example, let's presume that a researcher did a one-way
three-level ANOVA in which there were 10 people in each of the
three groups of car owners: (a) Ford, (b) Nissan, and (c) Rolls
Royce. The dependent variable might be satisfaction with one's
car. For this design three "simple" contrasts of mean levels of
satisfaction are possible:

$M_{\text{Ford}}$ (n = 10) = $M_{\text{Nissan}}$ (n = 10);

$M_{\text{Ford}}$ (n = 10) = $M_{\text{Rolls}}$ (n = 10); and

$M_{\text{Nissan}}$ (n = 10) = $M_{\text{Rolls}}$ (n = 10).

The "complex" contrasts include these simple contrasts, plus
the following three "uniquely complex" contrasts:

$M_{\text{Ford}}$ (n = 10) = $M_{\text{Nissan or Rolls}}$ (n = 20);

$M_{\text{Nissan}}$ (n = 10) = $M_{\text{Ford or Rolls}}$ (n = 20); and

$M_{\text{Rolls}}$ (n = 10) = $M_{\text{Ford or Nissan}}$ (n = 20).

Table 4 illustrates these combinations for both three- and four-
level one-way ANOVA problems. As Table 4 makes clear, as the
number of levels gets larger, the number of simple contrasts gets
larger, but the number of complex contrasts gets exponentially
larger.
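The growth in the number of contrasts can be sketched by enumeration. The counting rule used below is an assumption made for illustration: a contrast is taken to be any comparison between two disjoint, non-empty sets of levels, with the levels within each set pooled into one mean. For three levels this rule reproduces the three simple and three uniquely complex contrasts listed above; the values of Table 4 are not reproduced here, and the function name is hypothetical.

```python
from itertools import combinations, product


def count_contrasts(levels):
    """Return (simple, total) counts of two-mean contrasts for the given levels."""
    simple = len(list(combinations(levels, 2)))
    seen = set()
    # Assign each level to side A (0), side B (1), or leave it out (2).
    for assignment in product((0, 1, 2), repeat=len(levels)):
        side_a = frozenset(lv for lv, s in zip(levels, assignment) if s == 0)
        side_b = frozenset(lv for lv, s in zip(levels, assignment) if s == 1)
        if side_a and side_b:
            # A versus B is the same contrast as B versus A.
            seen.add(frozenset((side_a, side_b)))
    return simple, len(seen)


for k in (3, 4, 5):
    simple, total = count_contrasts([f"level{i}" for i in range(1, k + 1)])
    print(f"{k} levels: {simple} simple, {total - simple} uniquely complex, {total} total")
```

Under this rule the simple contrasts grow as k(k - 1)/2, while the total grows roughly with 3 to the power k, which is the sense in which the complex contrasts multiply far faster.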

For each comparison, simple or complex, there are specific
post hoc tests used. For simple comparisons the Tukey method,
also called the HSD (honestly significant difference) test, is
often used. For complex comparisons the Scheffe method is often
used (Hinkle, Wiersma, & Jurs, 1998). Each of these methods
utilizes an analogue to the Bonferroni correction in order to
maintain the $\alpha_{EW}$ at the a priori $\alpha$ level.

Tukey

The Tukey method is likely the most recommended and used
procedure for controlling Type I error when making simple
comparisons. The original Tukey method is based on the Studentized
range statistic, which takes into account the number of means
being compared, adjusting for the total number of tests needed to
make all simple comparisons. Later revisions of the Tukey method have
demonstrated its robustness to violations of normality and
homogeneity assumptions (Barnette, 1998). The Tukey method is
also relatively insensitive to skewness. In general, the Tukey
method is little affected by varied conditions. The exception
is the variability of the population means. Keselman
(1976) found that the Tukey method is more powerful under
maximum variability of the population means. This is logical
given that under this condition the magnitude of simple
comparisons is largest. However, with larger sample sizes, the
Tukey method tends to lose relative power.
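How a range-based statistic "takes into account the number of means being compared" can be illustrated by simulation. The sketch below is a simplification and not the Tukey procedure itself: it assumes k group means that are standard normal under the null, with a known standard error of 1, so the error degrees of freedom are effectively infinite. It estimates the value that the range (largest mean minus smallest mean) exceeds only 5% of the time. The function name, trial count, and seed are hypothetical.

```python
import random


def range_critical_value(k, alpha=0.05, trials=20_000, seed=3):
    """Simulated upper-alpha critical value of the range of k standardized means."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        means = [rng.gauss(0.0, 1.0) for _ in range(k)]  # null: all means equal
        ranges.append(max(means) - min(means))
    ranges.sort()
    return ranges[int((1 - alpha) * trials)]


for k in (2, 3, 4, 5):
    print(f"k={k}  approximate critical range={range_critical_value(k):.2f}")
```

The critical range rises as more means are compared, so a given pairwise difference must be larger to be declared statistically significant. This is the analogue of the Bonferroni correction that the text describes.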

Scheffe 

The Scheffe method is designed to analyze all possible
comparisons (Sato, 1996). Therefore, the Scheffe method is used
for complex or multiple comparisons. The Scheffe method's infinite
intersectional nature is its greatest strength and its greatest
weakness. It is strong because it can analyze all possible
comparisons. Klockars and Hancock (1998), however, assert that
researchers are not always interested in many of the comparisons
the Scheffe method makes. Because it is designed to test so many
comparisons, the Scheffe method is extremely conservative. As a
result, the Scheffe method can suffer a loss of power (Sato, 1996).
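The source of this conservatism can be seen in the usual textbook form of the Scheffe criterion, which is not given in the sources cited here and is offered only as a sketch. The notation is assumed for illustration: J is the number of groups, N the total sample size, $c_j$ the contrast weights, and $F_{\alpha;\,J-1,\,N-J}$ the critical value of the omnibus F test.

```latex
\hat{\psi} = \sum_{j=1}^{J} c_j \bar{Y}_j, \qquad \sum_{j=1}^{J} c_j = 0,
\qquad
\text{reject } H_0\colon \psi = 0 \ \text{ if } \
\frac{|\hat{\psi}|}{\widehat{SE}(\hat{\psi})} > \sqrt{(J - 1)\, F_{\alpha;\, J-1,\, N-J}}
```

Because the same enlarged critical value is applied to every contrast, simple or complex, the criterion protects the experimentwise rate over the entire set of possible contrasts, at the cost of power for any single one.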

Multivariate Methods 

Multivariate methods are designed for multiple outcome
variables. As Huberty and Morris (1989) noted, multivariate
methods ask, "Are there any overall effects present?" This
question, or this philosophy, best honors the reality from
which data are collected. That is, if data are collected from
samples upon which there are many influences, or variables, then
it is logical to use a statistical method that is designed to
take those variables into account simultaneously (Thompson,
1994).

Because multivariate methods are designed for multiple 
outcome variables, multivariate methods require only one omnibus 
test to determine if any differences exist. This is in contrast 
to univariate methods, which require many tests, thus increasing 
the likelihood of making erroneous decisions. For this reason 
alone, multivariate methods should be used when multiple outcome 
variables are of concern. 

Summary 

Although many researchers are familiar with "testwise" 
alpha, "experimentwise" Type I error rates are also important, 
and must be considered in many research situations. Testing 
multiple hypotheses with a single sample of data can radically 
inflate the "experimentwise" Type I error rate. 

The present paper has explained how this inflation can be
avoided in various research situations. First, it was explained
that ANOVA post hoc tests implicitly incorporate an analog
of the "Bonferroni correction" to avoid Type I error rate
inflation. Second, it was noted that multivariate statistics are
frequently employed by researchers to control "experimentwise"
errors that would otherwise occur by conducting several ANOVAs
or regression analyses with a single sample of data.

Table 2

All Possible Families of Outcomes

Table 3

Experimentwise Error Rate Without and With
the Application of the Bonferroni Correction

Table 4